Working with Korean Diplomatic Datasets

Introducing the iso3c_kr Function in the kdiplo R Package for Converting Korean Country Names into iso3c Country Codes

Korea
Korean dataset
dataset
R
R package
diplomacy
kdiplo
Author

Kadir Jun Ayhan

Published

Tuesday, April 23, 2024

In my research, I often work with country-year data from Korean sources, including data on diplomatic visits, trade and aid. One of the fundamental difficulties I have faced is the lack of universal country codes across these datasets. Further complicating matters, the country names themselves are inconsistent. For example, I found the Democratic Republic of the Congo written in five different ways across official sources: 콩고 민주공화국, 자이르 (Zaire, its former name), 콩고민주공화국, 콩고 민주 공화국 and 콩고민주공화국(DR콩고).

To address this issue, I created a function in my kdiplo package that converts Korean country names into ISO 3166-1 alpha-3 (iso3c) country codes. The function, iso3c_kr, assigns universal iso3c codes to Korean-language country names, which makes it easier to join different kinds of data.
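
The general idea behind such a conversion can be illustrated with a lookup table. The sketch below is not the kdiplo implementation: the lookup table `lookup_kr` and the helper `to_iso3c_sketch()` are hypothetical and exist only for illustration. It shows how stripping whitespace before matching collapses several of the Congo spellings above into a single key, so that only genuinely different names (such as the former name 자이르) need their own entries.

```r
library(dplyr)

# Hypothetical lookup table: whitespace-free Korean name -> iso3c code.
# Illustrative only; this is not the table shipped with kdiplo.
lookup_kr <- tibble(
  name_key = c("콩고민주공화국", "자이르", "콩고민주공화국(DR콩고)"),
  iso3c    = c("COD", "COD", "COD")
)

# Hypothetical helper: remove all whitespace so spacing variants
# (e.g. "콩고 민주 공화국") match the same key, then look up the code.
to_iso3c_sketch <- function(df, col) {
  df %>%
    mutate(name_key = gsub("\\s+", "", .data[[col]])) %>%
    left_join(lookup_kr, by = "name_key") %>%
    select(-name_key)
}

variants <- tibble(country_kr = c("콩고 민주공화국", "자이르", "콩고민주공화국",
                                  "콩고 민주 공화국", "콩고민주공화국(DR콩고)"))
to_iso3c_sketch(variants, "country_kr")
```

Five spellings map to COD with only three lookup entries, because normalization handles the spacing differences.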

One still needs to check whether the output is correct, especially for countries that have gone through political transitions, such as Germany, Yugoslavia, Russia, Vietnam and Yemen.

Korean government sources sometimes contain overlapping data, for example for Yugoslavia and Serbia. In such cases, one needs to inspect the data to make sure the same flows are not counted under both names.
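
One way to spot such overlaps after conversion is to look for iso3c-year combinations that occur more than once. The sketch below uses a small hypothetical table with invented values, and it assumes, purely for the example, that both the Yugoslavia and Serbia rows ended up with the same code; it does not show how iso3c_kr actually codes Yugoslavia.

```r
library(dplyr)

# Hypothetical converted country-year table (values invented).
# Assumes both rows were coded "SRB", which may not match iso3c_kr's mapping.
toy <- tibble(
  country_kr = c("유고슬라비아", "세르비아", "잠비아"),
  iso3c      = c("SRB", "SRB", "ZMB"),
  year       = c(2005, 2005, 2005),
  value      = c(100, 100, 50)
)

# iso3c-year combinations that occur more than once
dupes <- toy %>%
  count(iso3c, year) %>%
  filter(n > 1)

# The original rows behind each duplicated combination, for manual review
toy %>% semi_join(dupes, by = c("iso3c", "year"))
```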

For example, the following is a sample of Korean trade data from the Korean Statistical Information Service (KOSIS). In the column headers, 국가별 means "by country" and 년 means "year":

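library(dplyr)     # select(), filter(), mutate(), left_join()
library(tidyr)     # pivot_longer()
library(readr)     # parse_number()
library(magrittr)  # %>%, %<>% and set_names()
library(kdiplo)    # iso3c_kr()

trade <- readxl::read_xlsx("../../../../korea_visits/data/kosis_trade_240330.xlsx")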

trade[533:538,c(1,57:62)] %>% gt::gt()
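| 국가별 | 2018 년 | 2019 년 | 2020 년 | 2021 년 | 2022 년 | 2023 년 |
|---|---|---|---|---|---|---|
| 잠비아 | 26241 | 16087 | 17619 | 28356 | 14068 | 15459 |
| 잠비아 | 108344 | 54542 | 15164 | 100606 | 82198 | 53867 |
| 자이르 | NA | NA | NA | NA | NA | NA |
| 자이르 | 618 | 8 | 113 | 4 | NA | NA |
| 짐바브웨 | 25964 | 14088 | 15514 | 20404 | 16083 | 19563 |
| 짐바브웨 | 4909 | 13098 | 11377 | 9627 | 10415 | 20862 |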

The following is a sample of Korean aid data from Korea's ODA portal. Here 사업개수 is the number of projects and 약정액 is the committed amount, in USD and KRW:

aid <- readxl::read_xlsx("../../../../covid determinants 220818/data/korea_total_aid_2019_230709.xlsx")

aid %<>% select(1:5)

aid[c(50, 150, 250, 350, 450),] %>% gt::gt()
| country_kr | sector | 사업개수 | 약정액[USD] | 약정액[KRW] |
|---|---|---|---|---|
| 베트남 | 통신정책, 계획 및 행정(voluntary code) | 2 | 232334 | 270736486 |
| 캄보디아 | 11321 | 1 | 85815 | 99999361 |
| 미얀마 | 사회보호/보장 | 1 | 103460 | 120560903 |
| 라오스 | 비정규 농업훈련 | 1 | 107958 | 125802378 |
| 몽골 | 의료서비스 | 5 | 511824 | 596423389 |

Wide format is common in official Korean data sources, and the trade data above is an example: each year is a separate column. Before using the iso3c_kr function, let's first reshape the trade data into long (country-year) format, closer to the layout of the aid data. This makes joining the two datasets easier.

# Work on two copies: one for exports, one for imports
export <- trade
import <- trade

# Exports: drop the trailing unnamed column, then reshape the year columns (4 to 62) to long format
export %<>% select(-`...63`)
export_long <- export %>% pivot_longer(4:62, names_to = "year", values_to = "export_kosis")
export_long %<>% set_names(c("country_kr", "type", "unit", "year", "export_kosis"))

# Keep export rows only; convert thousand USD to USD and "2018 년" to 2018
export_long %<>% filter(type == "수출액[천달러]") %>%
  mutate(export_kosis = as.numeric(export_kosis) * 1000,
         year = parse_number(year)) %>%
  select(-type, -unit)

# Imports: same steps as for exports
import %<>% select(-`...63`)
import_long <- import %>% pivot_longer(4:62, names_to = "year", values_to = "import_kosis")
import_long %<>% set_names(c("country_kr", "type", "unit", "year", "import_kosis"))

import_long %<>% filter(type == "수입액[천달러]") %>%
  mutate(import_kosis = as.numeric(import_kosis) * 1000,
         year = parse_number(year)) %>%
  select(-type, -unit)

# Combine exports and imports into one country-year table
trade <- export_long %>% left_join(import_long, by = c("country_kr", "year"))
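
As an alternative to processing exports and imports in two parallel pipelines, one could pivot once and then spread the indicator column back out with `tidyr::pivot_wider()`. This is a sketch of that design choice, not the code used for the results in this post. It assumes the same leading columns (country, indicator, unit) and the "2018 년"-style year headers shown above; the toy values and the unit label are invented.

```r
library(dplyr)
library(tidyr)
library(readr)

# Toy wide table in a KOSIS-like layout (values and unit label invented)
wide <- tibble(
  country_kr = c("잠비아", "잠비아"),
  type       = c("수출액[천달러]", "수입액[천달러]"),
  unit       = c("천달러", "천달러"),
  `2018 년`  = c("10", "20"),
  `2019 년`  = c("30", "40")
)

trade_sketch <- wide %>%
  select(-unit) %>%
  # One row per country, indicator and year
  pivot_longer(-c(country_kr, type), names_to = "year", values_to = "value") %>%
  mutate(year  = parse_number(year),           # "2018 년" -> 2018
         value = as.numeric(value) * 1000,     # thousand USD -> USD
         type  = case_when(type == "수출액[천달러]" ~ "export_kosis",
                           type == "수입액[천달러]" ~ "import_kosis")) %>%
  # Spread exports and imports into separate columns
  pivot_wider(names_from = type, values_from = value)

trade_sketch
```

Because both indicators come from a single reshape, they are guaranteed to share the same country and year keys, and there is no need for a separate join step.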

With the iso3c_kr function, converting Korean country names into iso3c country codes takes a single line. The following is the output of iso3c_kr for the Korean trade data:

trade <- iso3c_kr(trade, "country_kr") # second argument: name of the column holding the Korean country names


trade[c(50, 150, 250, 350, 450, 550), c(1,5, 2:4)] %>% gt::gt()
| country_kr | iso3c | year | export_kosis | import_kosis |
|---|---|---|---|---|
| 계 | NA | 2014 | 572664607000 | 525514506000 |
| 아랍에미리트 연합 | ARE | 1996 | 1377933000 | 2259205000 |
| 앤티가바부다 | ATG | 1978 | NA | NA |
| 앵귈라 | AIA | 2019 | 817000 | 1000 |
| 아르메니아 | ARM | 2001 | 1255000 | 43000 |
| 앙골라 | AGO | 1983 | 235000 | NA |

In this sample, "계" (gye) did not get an iso3c code because it is not a country name: "계" means "total". It is best to check which entries did not receive an iso3c code:

missing_iso3c <- trade %>% filter(is.na(iso3c)) %>% pull(country_kr) %>% unique()

paste(missing_iso3c, collapse = ", ")
[1] "계, 국제통화기금, 기타, 기타국"

These mean "total", "IMF", "other" and "other countries" in Korean. In other words, no actual country is missing, which is good.
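
Because these rows are aggregates rather than countries, it is usually safer to drop them before joining, so that a total is never mistaken for a country observation. A minimal sketch, assuming a toy table with invented values and an exclusion vector built from the four labels listed above. Filtering by name rather than by `is.na(iso3c)` means that a real country name that failed to match would still show up as a missing code.

```r
library(dplyr)

# Non-country labels found above: total, IMF, other, other countries
non_countries <- c("계", "국제통화기금", "기타", "기타국")

# Toy converted table (values invented for illustration)
toy <- tibble(
  country_kr = c("계", "앙골라", "기타국"),
  iso3c      = c(NA, "AGO", NA),
  year       = c(2019, 2019, 2019)
)

toy_countries <- toy %>% filter(!country_kr %in% non_countries)

# Any remaining NA would point to a genuinely unmatched country name
stopifnot(!anyNA(toy_countries$iso3c))
```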

Now let’s convert the Korean country names in the aid data into iso3c country codes:

aid %<>% set_names(c("country_kr", "sector", "no_of_projects", "aid_usd", "aid_krw"))

aid <- iso3c_kr(aid, "country_kr") # second argument: name of the column holding the Korean country names


aid[c(50, 150, 250, 350, 450, 550),c(1, 6, 2:5)] %>% gt::gt()
| country_kr | iso3c | sector | no_of_projects | aid_usd | aid_krw |
|---|---|---|---|---|---|
| 베트남 | VNM | 통신정책, 계획 및 행정(voluntary code) | 2 | 232334 | 270736486 |
| 캄보디아 | KHM | 11321 | 1 | 85815 | 99999361 |
| 미얀마 | MMR | 사회보호/보장 | 1 | 103460 | 120560903 |
| 라오스 | LAO | 비정규 농업훈련 | 1 | 107958 | 125802378 |
| 몽골 | MNG | 의료서비스 | 5 | 511824 | 596423389 |
| 필리핀 | PHL | 농업용수자원 | 2 | 0 | 0 |

Once you have the iso3c codes, you can obtain English country names or other country codes (such as Correlates of War country codes) using, for example, the countrycode package.

trade <- trade %>% mutate(country_name = countrycode::countrycode(iso3c, origin = "iso3c", destination = "country.name"))

trade[c(50, 150, 250, 350, 450, 550),c(1, 5, 6, 2:4)] %>% gt::gt()
| country_kr | iso3c | country_name | year | export_kosis | import_kosis |
|---|---|---|---|---|---|
| 계 | NA | NA | 2014 | 572664607000 | 525514506000 |
| 아랍에미리트 연합 | ARE | United Arab Emirates | 1996 | 1377933000 | 2259205000 |
| 앤티가바부다 | ATG | Antigua & Barbuda | 1978 | NA | NA |
| 앵귈라 | AIA | Anguilla | 2019 | 817000 | 1000 |
| 아르메니아 | ARM | Armenia | 2001 | 1255000 | 43000 |
| 앙골라 | AGO | Angola | 1983 | 235000 | NA |
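
The same approach works for other coding schemes. For example, countrycode returns Correlates of War numeric codes through its `cown` destination. A small sketch follows; entities outside the COW state system, such as dependent territories, will likely come back as NA with a warning, so the output is worth checking just as with iso3c_kr.

```r
library(countrycode)

iso <- c("KOR", "PRK", "USA")

# Correlates of War numeric codes for the same iso3c codes
cow <- countrycode(iso, origin = "iso3c", destination = "cown")
cow
```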

More importantly, the function makes it possible to join different datasets that use Korean country names. In the example below, I join the trade data with the aid data using the iso3c country codes.

# The aid sample covers 2019 only, so add a year column to join on

aid$year <- 2019

trade_aid <- trade %>% left_join(aid, by = c("iso3c", "year"), suffix = c("", "_aid"))

trade_aid %>%
  filter(year == 2019 & !is.na(iso3c)) %>%
  slice(c(50, 150, 250, 350, 450, 550)) %>%
  select(c(1, 5, 6, 2:4, 8, 10)) %>%
  gt::gt()
| country_kr | iso3c | country_name | year | export_kosis | import_kosis | sector | aid_usd |
|---|---|---|---|---|---|---|---|
| 아르메니아 | ARM | Armenia | 2019 | 12729000 | 16743000 | 전문대,대학(원) 교육 | 119069 |
| 방글라데시 | BGD | Bangladesh | 2019 | 1282342000 | 404703000 | 건설정책 및 행정관리 | 46251 |
| 볼리비아 | BOL | Bolivia | 2019 | 30434000 | 450576000 | 환경정책 및 행정관리 | 80969 |
| 코트디부아르 | CIV | Côte d’Ivoire | 2019 | 136494000 | 5264000 | 교육정책 및 행정관리 | 30096 |
| 콜롬비아 | COL | Colombia | 2019 | 1143075000 | 718214000 | 성인 기초생활교육 | 62976 |
| 알제리 | DZA | Algeria | 2019 | 700918000 | 1746239000 | 레크리에이션 및 스포츠(voluntary code) | 22312 |
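
The aid data has several rows per country (one per sector in the sample above), so a left join on iso3c and year repeats each trade row once per matching aid row. If the goal is a country-year panel, one option is to aggregate the aid data to the iso3c-year level first. A minimal sketch with toy tables (values and sector labels invented):

```r
library(dplyr)

# Toy tables (values invented): one trade row per country-year,
# several aid rows per country-year (one per sector)
trade_toy <- tibble(iso3c = c("ARM", "BGD"), year = 2019,
                    export_kosis = c(100, 200))
aid_toy <- tibble(iso3c = c("ARM", "ARM", "BGD"), year = 2019,
                  sector = c("A", "B", "A"),
                  no_of_projects = c(1, 2, 1),
                  aid_usd = c(10, 20, 30))

# Collapse aid to one row per iso3c-year before joining
aid_cy <- aid_toy %>%
  group_by(iso3c, year) %>%
  summarise(no_of_projects = sum(no_of_projects, na.rm = TRUE),
            aid_usd = sum(aid_usd, na.rm = TRUE),
            .groups = "drop")

# Now the join keeps exactly one row per trade country-year
trade_aid_cy <- trade_toy %>% left_join(aid_cy, by = c("iso3c", "year"))
trade_aid_cy
```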

Voilà! We now have a single dataset combining trade and aid data, even though neither source originally had consistent country names or codes. I plan to add warning messages to iso3c_kr to make potential conversion issues easier to spot, and I will continue to update the Korean country name dataset in the kdiplo package as I come across new data sources. Feel free to report country names that iso3c_kr does not recognize using the issue tracker on GitHub.